CHAPTER 20 Getting the Hint from Epidemiologic Inference 295
covariate, it does not meet the rules, so you leave it out. You keep doing this
until you run out of variables. Although forward stepwise can work if you have
very few variables, most analysts do not use this approach because it has been
shown to be sensitive to the order in which you choose to enter variables.
» Backward elimination: In this approach, the first model you run contains all
your potential covariates, including all the confounders and the exposure.
Using modeling rules, each time you run the model, you remove or eliminate
the confounder contributing the least to the model. You decide which one
that is based on modeling rules you set (such as which confounder has the
largest p value). Theoretically, after you pare away the confounders that do
not meet the rules, you will have a final model. In practice, this process can
run into problems if you have collinear covariates (see Chapters 17 and 18 for
a discussion of collinearity). Your first model, filled with all your potential
covariates, may error out and fail to converge for this reason. Also, it is not
clear whether you should try a covariate in the model again once you have
eliminated it. This approach often sounds better on paper than it works in practice.
» Stepwise selection: This approach combines the best of forward stepwise
and backward elimination. Starting with the same set of candidate covariates,
you choose which covariate to introduce first into a model with the exposure.
If this covariate meets modeling rules, it is kept, and if not, it is left out. This
continues along as if you are doing forward stepwise — but then, there’s a
twist. After you are done trying each covariate and you have your forward
stepwise model, you go back and try to add back the covariates you left out
one by one. Each time one seems to fit back in, you keep it and consider it part
of the working model. It is during this phase that collinearity between covariates
can become very apparent. After you have retried the covariates you originally
left out and are satisfied that you have added back the ones that meet the
modeling rules, you can declare that you have arrived at the final model.
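The control flow of backward elimination and stepwise selection, as described in the bullets above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a recommended implementation: the model fitting is hidden behind a hypothetical `fit` function that returns a p value for each covariate, the modeling rule is assumed to be a single p value cutoff (`p_rule`), and the covariate names and p values in `toy_fit` are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: given the covariates to include alongside the
# exposure, fit the model and return one p value per covariate.
FitFunction = Callable[[List[str]], Dict[str, float]]


def backward_elimination(candidates: List[str], fit: FitFunction,
                         p_rule: float = 0.05) -> List[str]:
    """Start with every candidate; drop the largest p value until all pass."""
    kept = list(candidates)
    while kept:
        p_values = fit(kept)  # rerun the model on what is left
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] <= p_rule:
            break  # every remaining covariate meets the rule
        kept.remove(worst)
    return kept


def try_each(candidates: List[str], working: List[str], fit: FitFunction,
             p_rule: float) -> Tuple[List[str], List[str]]:
    """Offer each candidate once, in order; keep those that meet the rule."""
    left_out = []
    for name in candidates:
        p_values = fit(working + [name])
        if p_values[name] <= p_rule:
            working = working + [name]
        else:
            left_out.append(name)
    return working, left_out


def stepwise_selection(candidates: List[str], fit: FitFunction,
                       p_rule: float = 0.05) -> List[str]:
    """Forward pass first, then retry the covariates that were left out."""
    working, left_out = try_each(candidates, [], fit, p_rule)
    working, _ = try_each(left_out, working, fit, p_rule)
    return working


def toy_fit(covariates: List[str]) -> Dict[str, float]:
    """Stand-in for a real regression. All p values here are invented."""
    p = {"age": 0.001, "smoking": 0.03, "income": 0.60}
    # Assumption for illustration: sex only looks useful once smoking is in.
    p["sex"] = 0.04 if "smoking" in covariates else 0.20
    return {name: p[name] for name in covariates}


candidates = ["age", "sex", "smoking", "income"]
print(backward_elimination(candidates, toy_fit))
print(stepwise_selection(candidates, toy_fit))
```

With these invented p values, backward elimination drops only income, while stepwise selection leaves sex out on the forward pass and then adds it back on the second pass, once smoking is in the working model. In a real analysis, `fit` would call your statistical software, and the p values would change every time the set of covariates changes.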
Once you produce your final model, check the p value for the covariate or
covariates representing your exposure. If they are not statistically significant,
your data do not support your hypothesis: after controlling for confounding, your
exposure was not statistically significantly associated with the outcome. (This is
a lack of evidence for an association, not proof that none exists.) However,
if the p value is statistically significant, then you would move on to interpret the
results for your exposure covariates from your regression model. After controlling
for confounding, your exposure was statistically significantly associated with
your outcome. Yay!
Use a spreadsheet to keep track of each model you run and a summary of the
results. Save this in addition to your computer code for running the models. It can
help you communicate with others about why certain covariates were or were not
retained in your final model.
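One possible way to keep such a log is to have your code append a row to a CSV file, which opens in any spreadsheet program, each time you run a model. The sketch below assumes four hypothetical columns (`model_id`, `covariates`, `exposure_p_value`, `decision`), and the two example rows contain invented values. Adapt the columns to whatever your modeling rules require.

```python
import csv
import os
import tempfile

# Hypothetical column names; adjust them to suit your own modeling rules.
FIELDS = ["model_id", "covariates", "exposure_p_value", "decision"]


def log_model(path, model_id, covariates, exposure_p_value, decision):
    """Append one model run to a CSV file, writing the header on first use."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "model_id": model_id,
            "covariates": "; ".join(covariates),
            "exposure_p_value": exposure_p_value,
            "decision": decision,
        })


# Example with invented values, written to a temporary folder.
log_path = os.path.join(tempfile.mkdtemp(), "model_log.csv")
log_model(log_path, 1, ["age", "sex", "smoking", "income"], 0.08,
          "removed income (largest p value)")
log_model(log_path, 2, ["age", "sex", "smoking"], 0.03,
          "all covariates met the rule; final model")

# Read the log back to confirm what was recorded.
with open(log_path, newline="") as handle:
    rows = list(csv.DictReader(handle))
for row in rows:
    print(row["model_id"], row["covariates"], row["decision"])
```

Because each row records the reason for the decision next to the covariates that were in the model, the file doubles as the explanation you give to collaborators.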